Operating processors over a network

ABSTRACT

A client processor can save an execution state of a process that runs on two or more secondary processors in a single file. The single file can be transferred from the client processor over a network to a host processor. The single file is configured to permit the host processor to resume processing of the suspended process. It is emphasized that this abstract is provided to comply with the rules requiring an abstract that will allow a searcher or other reader to quickly ascertain the subject matter of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims.

CLAIM OF PRIORITY

This application is a continuation of U.S. patent application Ser. No. 11/238,086, filed Sep. 27, 2005 and entitled “OPERATING CELL PROCESSORS OVER A NETWORK”, the entire disclosures of which are incorporated herein by reference.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to commonly-assigned U.S. patent application Ser. No. 11/238,077, filed Sep. 27, 2005 and entitled “CELL PROCESSOR METHODS AND APPARATUS” to John P. Bates, Payton R. White and Attila Vass, now U.S. Pat. No. 8,141,076, the entire disclosures of which are incorporated herein by reference.

This application is also related to commonly-assigned U.S. patent application Ser. No. 11/238,095, filed Sep. 27, 2005, now U.S. Pat. No. 7,522,168, the entire disclosures of which are incorporated herein by reference.

This application is related to commonly-assigned co-pending U.S. patent application Ser. No. 11/238,087, filed Sep. 27, 2005, now U.S. Pat. No. 8,037,474, the entire disclosures of which are incorporated herein by reference.

This application is also related to commonly-assigned U.S. patent application Ser. No. 11/238,085, filed Sep. 27, 2005, now U.S. Pat. No. 7,506,123, the entire disclosures of which are incorporated herein by reference.

FIELD OF THE INVENTION

Embodiments of the present invention are directed to cell processors and more particularly to operating multiple cell processors over a network.

BACKGROUND OF THE INVENTION

Cell processors are a type of microprocessor that utilizes parallel processing. The basic configuration of a cell processor includes a “Power Processor Element” (“PPE”) (sometimes called “Processing Element”, or “PE”), and multiple “Synergistic Processing Elements” (“SPE”). The PPEs and SPEs are linked together by an internal high speed bus dubbed “Element Interconnect Bus” (“EIB”). Cell processors are designed to be scalable for use in applications ranging from handheld devices to mainframe computers.

In certain cell processors, the SPEs provide a monolithic execution environment. Each SPE has a well isolated execution set or context that facilitates portability and network transparency of applications running on the cell processor. Such portable SPE applications have been called SPUlets or APUlets. However, there are disadvantages associated with the identical execution environment sizes for the SPUlets. Specifically, SPUlets only come in a single grain size. A normal prior art SPUlet can simply be a single executable file image that is to be loaded into a single SPE. As applications expect more resources for execution, splitting these resources into multiple SPUlets is not efficient, particularly when such SPUlets need to be transferred across a network.

Thus, there is a need in the art for a data structure having a larger sized unit of migration so that cell processor applications can be packaged and migrated to operate and interoperate across and in a network.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a cell broadband engine architecture implementing an extended SPUlet according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of a cell processor according to an embodiment of the present invention.

FIG. 3 is a block diagram illustrating an extended SPUlet according to an embodiment of the present invention.

FIG. 4 is a flow diagram illustrating execution of an extended SPUlet according to an embodiment of the present invention.

FIG. 5A is a block diagram illustrating memory allocation of an extended SPUlet during a stage of execution.

FIG. 5B is a block diagram illustrating memory allocation of an extended SPUlet during a different stage of execution.

FIG. 6 is a flow diagram illustrating network operation of cell processors using extended SPUlets according to an embodiment of the present invention.

FIG. 7 is a flow diagram illustrating an example of saving the state of an SPU.

FIG. 8 is a block diagram illustrating the memory structure of suspended state information to be saved for a SPUlet that has been suspended according to an embodiment of the present invention.

FIG. 9 is a flow diagram illustrating an example of resuming operation of an extended SPUlet that has been suspended.

FIG. 10 is a flow diagram illustrating resumption of suspended execution of an SPE.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS

Although the following detailed description contains many specific details for the purposes of illustration, anyone of ordinary skill in the art will appreciate that many variations and alterations to the following details are within the scope of the invention. Accordingly, the exemplary embodiments of the invention described below are set forth without any loss of generality to, and without imposing limitations upon, the claimed invention.

In embodiments of the present invention, a cell processor can load, store and save information relating to the operation of one or more SPEs of the cell processor in units of migration referred to herein as extended SPUlets. Unlike prior art SPUlets, an extended SPUlet according to embodiments of the present invention may include either two or more SPU images or one or more SPU images and additional information related to operation of multiple SPUs, e.g., shared initialized data. Generally, the shared data is shared by two or more SPEs that execute the extended SPUlet. To isolate the execution context, it is desirable to avoid PPU access to the shared data. However, the PPU may access the shared data for the purpose of management, such as suspend and resume. Communication between the extended SPUlet and the managing PPU can be done through a message box area of memory specifically set up for that purpose. The extended SPUlet provides a larger grain size than prior art SPUlets. The extended SPUlet can address issues of setting up multiple SPEs, providing additional memory for shared initialized data, additional code, etc., and memory mapping between the SPEs and system main memory.

A cell processor may generally include four separate types of functional components: a PowerPC Processor Element (PPE), a Synergistic Processor Unit (SPU), a Memory Flow Controller (MFC) and an Internal Interrupt Controller (IIC). The computational units in the CBEA-compliant processor are the PPE and the SPU. Each SPU must have a dedicated local storage, a dedicated MFC with its associated Memory Management Unit (MMU), and Replacement Management Table (RMT). The combination of these components is referred to as an SPU Element (SPE). A cell processor may be a single chip, a multi-chip module (or modules), or multiple single-chip modules on a motherboard or other second-level package, depending on the technology used and the cost/performance characteristics of the intended design point.

By way of example, and without limitation, FIG. 1 illustrates a type of cell processor 100 characterized by an architecture known as Cell Broadband Engine Architecture (CBEA), i.e., a CBEA-compliant processor. A cell processor can include multiple groups of PPEs (PPE groups) and multiple groups of SPEs (SPE groups) as shown in this example. Alternatively, the cell processor may have only a single SPE group and a single PPE group with a single SPE and a single PPE. Hardware resources can be shared between units within a group. However, the SPEs and PPEs must appear to software as independent elements.

In the example depicted in FIG. 1, the cell processor 100 includes a number of groups of SPEs SG_0 . . . SG_n and a number of groups of PPEs PG_0 . . . PG_p. Each SPE group includes a number of SPEs SPE0 . . . SPEg. The cell processor 100 also includes a main memory MEM and an input/output function I/O. One or more extended SPUlets 102 of the types described herein may be stored in the main memory MEM.

Each PPE group includes a number of PPEs PPE_0 . . . PPE_g. In this example a group of SPEs shares a single cache SL1. The cache SL1 is a first-level cache for direct memory access (DMA) transfers between local storage and main storage. Each PPE in a group has its own first level (internal) cache L1. In addition the PPEs in a group share a single second-level (external) cache L2. While caches are shown for the SPE and PPE in FIG. 1, they are optional for cell processors in general and CBEA in particular.

An Element Interconnect Bus EIB connects the various components listed above. The SPEs of each SPE group and the PPEs of each PPE group can access the EIB through bus interface units BIU. The cell processor 100 also includes two controllers typically found in a processor: a Memory Interface Controller MIC that controls the flow of data between the EIB and the main memory MEM, and a Bus Interface Controller BIC, which controls the flow of data between the I/O and the EIB. Although the requirements for the MIC, BIC, BIUs and EIB may vary widely for different implementations, those of skill in the art will be familiar with their functions and circuits for implementing them.

Each SPE includes an SPU (SPU0 . . . SPUg). Each SPU in an SPE group has its own local storage area LS and a dedicated memory flow controller MFC that includes an associated memory management unit MMU that can hold and process memory-protection and access-permission information.

The PPEs may be 64-bit PowerPC Processor Units (PPUs) with associated caches. A CBEA-compliant system includes a vector multimedia extension unit in the PPE. The PPEs are general-purpose processing units, which can access system management resources (such as the memory-protection tables, for example). Hardware resources defined in the CBEA are mapped explicitly to the real address space as seen by the PPEs. Therefore, any PPE can address any of these resources directly by using an appropriate effective address value. A primary function of the PPEs is the management and allocation of tasks for the SPEs in a system.

The SPUs are less complex computational units than PPEs, in that they do not perform any system management functions. They generally have a single instruction, multiple data (SIMD) capability and typically process data and initiate any required data transfers (subject to access properties set up by a PPE) in order to perform their allocated tasks. The purpose of the SPU is to enable applications that require a higher computational unit density and can effectively use the provided instruction set. A significant number of SPUs in a system, managed by the PPEs, allow for cost-effective processing over a wide range of applications. The SPUs implement a new instruction set architecture.

MFC components are essentially the data transfer engines. The MFC provides the primary method for data transfer, protection, and synchronization between main storage of the cell processor and the local storage of an SPE. An MFC command describes the transfer to be performed. A principal architectural objective of the MFC is to perform these data transfer operations in as fast and as fair a manner as possible, thereby maximizing the overall throughput of a cell processor. Commands for transferring data are referred to as MFC DMA commands. These commands are converted into DMA transfers between the local storage domain and main storage domain.

Each MFC can typically support multiple DMA transfers at the same time and can maintain and process multiple MFC commands. In order to accomplish this, the MFC maintains and processes queues of MFC commands. The MFC can queue multiple transfer requests and issue them concurrently. Each MFC provides one queue for the associated SPU (MFC SPU command queue) and one queue for other processors and devices (MFC proxy command queue). Logically, a set of MFC queues is always associated with each SPU in a cell processor, but some implementations of the architecture can share a single physical MFC between multiple SPUs, such as an SPU group. In such cases, all the MFC facilities must appear to software as independent for each SPU. Each MFC DMA data transfer command request involves both a local storage address (LSA) and an effective address (EA). The local storage address can directly address only the local storage area of its associated SPU. The effective address has a more general application, in that it can reference main storage, including all the SPU local storage areas, if they are aliased into the real address space (that is, if MFC_SR1[D] is set to ‘1’).

An MFC presents two types of interfaces: one to the SPUs and another to all other processors and devices in a processing group. The SPUs use a channel interface to control the MFC. In this case, code running on an SPU can only access the MFC SPU command queue for that SPU. Other processors and devices control the MFC by using memory-mapped registers. It is possible for any processor and device in the system to control an MFC and to issue MFC proxy command requests on behalf of the SPU. The MFC also supports bandwidth reservation and data synchronization features. To facilitate communication between the SPUs and/or between the SPUs and the PPU, the SPEs and PPEs may include signal notification registers that are tied to signaling events. Typically, the PPEs and SPEs are coupled by a star topology in which the PPE acts as a router to transmit messages to the SPEs. Such a topology does not provide for direct communication between SPEs. Instead each SPE and each PPE has a one-way signal notification register referred to as a mailbox. The mailbox can be used for SPE to host OS synchronization.

The IIC component manages the priority of the interrupts presented to the PPEs. The main purpose of the IIC is to allow interrupts from the other components in the processor to be handled without using the main system interrupt controller. The IIC is really a second level controller. It is intended to handle all interrupts internal to a CBEA-compliant processor or within a multiprocessor system of CBEA-compliant processors. The system interrupt controller will typically handle all interrupts external to the cell processor.

In a cell processor system, software often must first check the IIC to determine if the interrupt was sourced from an external system interrupt controller. The IIC is not intended to replace the main system interrupt controller for handling interrupts from all I/O devices.

There are two types of storage domains within the cell processor: local storage domain and main storage domain. The local storage of the SPEs exists in the local storage domain. All other facilities and memory are in the main storage domain. Local storage consists of one or more separate areas of memory storage, each one associated with a specific SPU. Each SPU can only execute instructions (including data load and data store operations) from within its own associated local storage domain. Therefore, any required data transfers to, or from, storage elsewhere in a system must always be performed by issuing an MFC DMA command to transfer data between the local storage domain (of the individual SPU) and the main storage domain, unless local storage aliasing is enabled.

An SPU program references its local storage domain using a local address. However, privileged software can allow the local storage domain of the SPU to be aliased into main storage domain by setting the D bit of the MFC_SR1 to ‘1’. Each local storage area is assigned a real address within the main storage domain. (A real address is either the address of a byte in the system memory, or a byte on an I/O device.) This allows privileged software to map a local storage area into the effective address space of an application to allow DMA transfers between the local storage of one SPU and the local storage of another SPU.

Other processors or devices with access to the main storage domain can directly access the local storage area, which has been aliased into the main storage domain using the effective address or I/O bus address that has been mapped through a translation method to the real address space represented by the main storage domain.

Data transfers that use the local storage area aliased in the main storage domain should do so as caching inhibited, since these accesses are not coherent with the SPU local storage accesses (that is, SPU load, store, instruction fetch) in its local storage domain. Aliasing the local storage areas into the real address space of the main storage domain allows any other processors or devices, which have access to the main storage area, direct access to local storage. However, since aliased local storage must be treated as non-cacheable, transferring a large amount of data using the PPE load and store instructions can result in poor performance. Data transfers between the local storage domain and the main storage domain should use the MFC DMA commands to avoid stalls.

The addressing of main storage in the CBEA is compatible with the addressing defined in the PowerPC Architecture. The CBEA builds upon the concepts of the PowerPC Architecture and extends them to addressing of main storage by the MFCs.

An application program executing on an SPU or in any other processor or device uses an effective address to access the main memory. The effective address is computed when the PPE performs a load, store, branch, or cache instruction, and when it fetches the next sequential instruction. An SPU program must provide the effective address as a parameter in an MFC command. The effective address is translated to a real address according to the procedures described in the overview of address translation in PowerPC Architecture, Book III. The real address is the location in main storage which is referenced by the translated effective address. Main storage is shared by all PPEs, MFCs, and I/O devices in a system. All information held in this level of storage is visible to all processors and to all devices in the system. This storage area can either be uniform in structure, or can be part of a hierarchical cache structure. Programs reference this level of storage using an effective address.

The main memory of a system typically includes both general-purpose and nonvolatile storage, as well as special-purpose hardware registers or arrays used for functions such as system configuration, data-transfer synchronization, memory-mapped I/O, and I/O subsystems. There are a number of different possible configurations for the main memory. By way of example and without limitation, Table I lists the sizes of address spaces in main memory for a particular cell processor implementation known as Cell Broadband Engine Architecture (CBEA).

TABLE I

Address Space            Size                               Description
Real Address Space       2^(m) bytes where m ≦ 62
Effective Address Space  2^(64) bytes                       An effective address is translated to a virtual address using the segment lookaside buffer (SLB).
Virtual Address Space    2^(n) bytes where 65 ≦ n ≦ 80      A virtual address is translated to a real address using the page table.
Real Page                2^(12) bytes
Virtual Page             2^(p) bytes where 12 ≦ p ≦ 28      Up to eight page sizes can be supported simultaneously. A small 4-KB (p = 12) page is always supported. The number of large pages and their sizes are implementation-dependent.
Segment                  2^(28) bytes                       The number of virtual segments is 2^(n − 28) where 65 ≦ n ≦ 80.

Note: The values of “m,” “n,” and “p” are implementation-dependent.
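By way of illustration only, the relationships among the sizes in Table I can be checked with a few lines of arithmetic. The following sketch is not part of any cell processor interface; the function names are hypothetical and the bounds are simply those listed in the table.

```python
# Illustrative arithmetic for Table I. The exponents n and p are
# implementation-dependent; the function names here are hypothetical.
SEGMENT_BITS = 28  # a segment is 2^28 bytes


def virtual_segment_count(n: int) -> int:
    """Number of 2^28-byte segments in a 2^n-byte virtual address space."""
    if not 65 <= n <= 80:
        raise ValueError("Table I gives 65 <= n <= 80")
    return 2 ** (n - SEGMENT_BITS)


def pages_per_segment(p: int) -> int:
    """Number of 2^p-byte virtual pages that fit in one segment."""
    if not 12 <= p <= 28:
        raise ValueError("Table I gives 12 <= p <= 28")
    return 2 ** (SEGMENT_BITS - p)
```

For instance, with the always-supported 4-KB page (p = 12), one segment holds 2^(16) pages, and with the largest permitted page (p = 28) a segment is a single page.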

The cell processor 100 may include an optional facility for managing critical resources within the processor and system. The resources targeted for management under the cell processor are the translation lookaside buffers (TLBs) and data and instruction caches. Management of these resources is controlled by implementation-dependent tables.

Tables for managing TLBs and caches are referred to as replacement management tables RMT, which may be associated with each MMU. Although these tables are optional, it is often useful to provide a table for each critical resource, which can be a bottleneck in the system. An SPE group may also contain an optional cache hierarchy, the SL1 caches, which represent first level caches for DMA transfers. The SL1 caches may also contain an optional RMT.

The foregoing is intended to provide an introduction and description of the terminology used in cell processor implementations. The foregoing discussion is also intended to set forth a context for data structures and methods according to embodiments of the present invention. Such embodiments are not limited to implementation on or with cell processors having the architecture described above. However, any or all of the embodiments described below may be implemented using such cell architecture as an environment in which extended SPUlets may be encountered and utilized.

FIG. 2 depicts an example of a cell processor 200 operating with extended SPUlets. For the purposes of illustration, the cell processor includes a main memory 202, a single PPE 204 and eight SPEs 206. However, a cell processor may be configured with any number of SPEs. With respect to FIG. 2, the memory, PPE, and SPEs can communicate with each other and with an I/O device 208 over a ring-type element interconnect bus 210. Extended SPUlets 212 may be stored in main memory 202, transferred to other cell processors, e.g., via the I/O device 208 and a network 214, or loaded piecewise into the various SPEs 206 that make up the cell processor.

As set forth above, the extended SPUlets 102, 212 generally include one or more SPU images and additional data, such as uninitialized data, or they may include two or more SPU images. FIG. 3 illustrates the arrangement of data that can make up an extended SPUlet 300. This data may include but is not limited to SPU images 302, shared initialized data 304, information relating to uninitialized data 306 and a message box 308. The extended SPUlet 300 may optionally include a file header 310.

The SPU images 302 typically contain the contents of the local store of an SPE in a cell processor. SPU images may be gathered from the SPEs during processing with the cell processor. The SPU images may contain data that has been processed by the SPU, data to be processed by the SPU, and code for processing the data with the SPU. The SPU images 302 may also contain data regarding a DMA state of the MFC and a hardware state of the SPE when the extended SPUlet 300 was suspended. The initialized data 304 is data having established values that can be stored in main memory and/or shared amongst several SPEs that are executing a particular process, depending on the configuration. In contrast, uninitialized data has no pre-established value, but parameters regarding that data are known. For example, the information relating to uninitialized data 306 may refer to the type of data, size and location of memory space needed for that data. The message box 308 is a window of memory where incoming and outgoing streams of data can be accessed by the SPUs and PPU. The host operating system can provide system service (communication socket, etc.) through the message box 308. The extended SPUlet 300 may also return information to the client environment using the message box 308 as an interface.

The message box area 308 is used for communication between the PPU and the extended SPUlet 300. The message box area may be divided into multiple message boxes. Each box can be used for a single direction of communication, e.g., extended SPUlet to PPE or PPE to SPE. The message box 308 could be configured as a single buffer or a ring buffer with a management area that is updated by a reader and writer for handshaking. The format of information within the message box area 308 is up to the application, but there could be certain preset protocols. Such preset protocols may be indicated in the file header 310.
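By way of example, and without limitation, a single-direction message box configured as a ring buffer might be sketched as follows. This is a minimal model under stated assumptions: the class name, the slot-based layout and the two indices standing in for the management area are hypothetical and are not taken from any particular implementation.

```python
class MessageBox:
    """One-directional ring buffer (hypothetical sketch).

    The write index is updated only by the writer and the read index only
    by the reader; together they play the role of the management area used
    for handshaking. One slot is kept empty to tell "full" from "empty".
    """

    def __init__(self, slots: int):
        if slots < 2:
            raise ValueError("need at least two slots")
        self._slots = [None] * slots
        self._write = 0  # management area: writer's position
        self._read = 0   # management area: reader's position

    def put(self, message) -> bool:
        """Writer side: returns False if the box is full."""
        nxt = (self._write + 1) % len(self._slots)
        if nxt == self._read:
            return False
        self._slots[self._write] = message
        self._write = nxt
        return True

    def get(self):
        """Reader side: returns None if the box is empty."""
        if self._read == self._write:
            return None
        message = self._slots[self._read]
        self._read = (self._read + 1) % len(self._slots)
        return message
```

Two such boxes, one per direction, would give the PPE-to-SPE and SPE-to-PPE channels described above. The format of each message is left to the application.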

By way of example and without limitation, the file header 310 could indicate that the message box 308 is used for the extended SPUlet to communicate back to a client using certain protocols handled by the host. Alternatively, the file header 310 may indicate that the message box 308 is used for an SPE to request system service from the PPU. Examples of such system service include requesting additional memory, opening a new network connection, and the like. Furthermore, the file header 310 may indicate that the message box 308 is used for the PPU to request suspension of the extended SPUlet 300.

It is important to note that the particular contents of an extended SPUlet depend on context. For example, when an extended SPUlet has been saved to main memory, the image of the extended SPUlet 300 in system memory includes the SPU images 302, shared initialized data 304, information regarding uninitialized data 306 and message box 308. This combination of data elements is referred to as the image of the extended SPUlet in system memory. However, when the extended SPUlet 300 is transferred from a client device across a network to another cell processor (referred to herein as a host processor) a file header 310 is combined with the SPU images 302 and initialized data 304. This combination of data elements (referred to herein as a file image) is what is transferred.

The file header 310 may include information that tells the host cell processor about the extended SPUlet. The header information may be categorized as either Execution Information or Extended SPUlet Information. Execution Information may include host resources, connection requirements, and other criteria describing the environment in which the SPUlet should run. Extended SPUlet Information describes things like memory layout, mapping, start offsets and other initialization, and message box configuration.

Such information may include, e.g., memory availability (i.e., how much memory is needed to run the extended SPUlet), SPU availability (i.e., how many SPUs are needed to run the extended SPUlet), network latency and bandwidth and system frequency requirements for the extended SPUlet, control flow information (e.g., whether the host or client machine has the right to interrupt and suspend the extended SPUlet), memory offsets, breakpoints of one or more SPU images, size of one or more SPU images, memory mapping information, message box layout, message box capabilities and the like. It should be understood that the header may also define information in connection with a user, id, system, function, data type, channel, flag, key, password, protocol, target or profile or any metric in which system or operation may be established wherein such may relate to or be directed from the extended SPUlet and including but not limited to configuration, initialization, modification or synchronization of operations involving any program or system or module or object that satisfies an overall goal of the application in which the extended SPUlet operates to serve. Such applications may include security related applications and protocols, encoding, decoding and transcoding applications, transactions, etc. The file header 310 can be created by the PPE just prior to transmission and transmitted with the SPU images and initialized data. Alternatively, the file header 310 may be part of the file image and sent as part of a stack transmission.
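By way of illustration only, one possible serialization of a file image (file header followed by SPU images and initialized data) is sketched below. The on-the-wire layout, the JSON header encoding and all field and function names are assumptions made for this sketch; the description above does not prescribe any particular format.

```python
import json
import struct
from dataclasses import dataclass


@dataclass
class FileImage:
    header: dict               # e.g. {"spus_needed": 2, "memory_needed": 4096}
    spu_images: list           # one bytes object per SPU local store image
    initialized_data: bytes = b""


def pack(image: FileImage) -> bytes:
    """Assumed layout: 4-byte header length, JSON header, images, data."""
    header = dict(image.header)
    header["spu_image_sizes"] = [len(i) for i in image.spu_images]
    header["initialized_data_size"] = len(image.initialized_data)
    raw = json.dumps(header).encode("utf-8")
    body = b"".join(image.spu_images) + image.initialized_data
    return struct.pack(">I", len(raw)) + raw + body


def read_header(blob: bytes):
    """Return (header, offset of first SPU image) without touching the body."""
    (length,) = struct.unpack_from(">I", blob, 0)
    return json.loads(blob[4:4 + length].decode("utf-8")), 4 + length


def unpack(blob: bytes) -> FileImage:
    header, offset = read_header(blob)
    images = []
    for size in header["spu_image_sizes"]:
        images.append(blob[offset:offset + size])
        offset += size
    data = blob[offset:offset + header["initialized_data_size"]]
    return FileImage(header, images, data)
```

Placing the header first means a receiver can examine the header alone, via `read_header` in this sketch, before deciding what to do with the remainder of the file image.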

In general, an SPU cannot access privileged SPU control. Consequently it is often necessary for the extended SPUlet 300 to load each SPE with suitable code, which can be started when loaded. Furthermore, in order to communicate, the extended SPUlet 300 desirably includes memory mapping information that maps the SPEs involved to each other and to any shared portion of main memory.

FIG. 4 illustrates a general method 400 for operating two or more cell processors over a network using extended SPUlets. The extended SPUlet is transferred as a file image from the client device to the host device at step 402. Transfer of the file image between the host and client cell processors may take place across any network or bus, including but not limited to secure and unsecure networks, local area networks (LAN), wide area networks (WAN), or a public network such as the Internet. In some embodiments, the client machine may send the file header 310 to the host machine before sending the rest of the extended SPUlet. The host machine can analyze the information in the file header for acceptance criteria, e.g., whether the host machine, or another device to which the SPUlet is directed, is known or determined to have sufficient SPUs, security clearance, rights, configuration, memory, etc. available to run the extended SPUlet. The host machine can then decide whether or not to accept the extended SPUlet or pass the extended SPUlet to another device or the target machine to which the SPUlet is directed.
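By way of example, and without limitation, the host's decision to accept the extended SPUlet or pass it along might be sketched as follows. Only two acceptance criteria (SPU count and memory) are modeled; the dictionary keys and the function name are hypothetical.

```python
def decide(header: dict, host: dict, peers: dict) -> tuple:
    """Return ("accept", None), ("forward", peer_name) or ("reject", None).

    `header` carries the requirements from the file header; `host` and each
    entry of `peers` describe currently available resources. All key names
    are assumptions made for this sketch.
    """

    def fits(resources: dict) -> bool:
        return (resources["free_spus"] >= header["spus_needed"]
                and resources["free_memory"] >= header["memory_needed"])

    if fits(host):
        return ("accept", None)
    # Otherwise pass the extended SPUlet to the first peer that qualifies.
    for name, resources in peers.items():
        if fits(resources):
            return ("forward", name)
    return ("reject", None)
```

A fuller check would also cover the other criteria named above, such as security clearance, rights and configuration.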

If the host machine accepts the extended SPUlet, it allocates system memory for the extended SPUlet at step 404. The host machine may use the information in the file header to allocate the size and data type for a block of memory for the SPU images 302 and shared initialized data 304. Once the memory space has been allocated, the host processor can load the SPU images 302 and initialized data 304 of the extended SPUlet 300 into the main memory of the host cell processor at step 406. The host cell processor can then allocate an area for uninitialized data (if any) and a message box. It is preferred that memory is allocated in main memory of the PPU. However, specialized SPUlet applications may configure memory in the PPU and/or in one or more SPU local stores, depending on the application. Generally, memory is allocated in main memory to satisfy the extended reach memory requirements for complex processing, such as video transcoding. FIGS. 5A-5B illustrate the organization of data for the extended SPUlet on the cell processor of the host device (the host cell processor). As shown in FIG. 5A, the host processor receives the file image containing the SPU images 302, initialized data 304 and file header 310. Typically, only the SPU images 302 and initialized data 304 are stored in the host cell processor's main memory. These form the main memory footprint of the extended SPUlet 300. The data in the header 310 may be discarded once the host processor is finished with it.

At step 408, the host cell processor allocates an area in its main memory for uninitialized data 506 and a message box 508. As shown in FIG. 5A the combination of SPU images 302, initialized data 304 and the areas allocated for uninitialized data 506 and the message box 508 constitute the image in the host cell processor's main memory for the extended SPUlet 300. At step 410, the host processor allocates SPEs 510 (as shown in FIG. 5B) for the extended SPUlet 300. Once the SPEs 510 are allocated, the SPU images 302 are loaded into the allocated SPEs 510 at step 412. The SPEs can then be run on the host cell processor at step 414.
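By way of illustration only, steps 404 through 412 can be modeled in a few lines. The following is a toy simulation, not host operating system code: main memory and SPE local stores are represented as dictionaries, and the class, function and key names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HostCell:
    free_spes: list                                    # ids of idle SPEs
    main_memory: dict = field(default_factory=dict)    # named regions
    local_stores: dict = field(default_factory=dict)   # SPE id -> image


def load_extended_spulet(host, spu_images, initialized_data,
                         uninit_size, msgbox_size):
    """Model steps 404-412; returns the SPE ids ready to run (step 414)."""
    if len(spu_images) > len(host.free_spes):
        raise RuntimeError("not enough free SPEs")
    # Steps 404/406: allocate main memory and load images and shared data.
    host.main_memory["spu_images"] = list(spu_images)
    host.main_memory["initialized_data"] = bytes(initialized_data)
    # Step 408: allocate areas for uninitialized data and the message box.
    host.main_memory["uninitialized_data"] = bytearray(uninit_size)
    host.main_memory["message_box"] = bytearray(msgbox_size)
    # Step 410: allocate one SPE per SPU image.
    allocated = [host.free_spes.pop(0) for _ in spu_images]
    # Step 412: load each SPU image into its SPE's local store.
    for spe, image in zip(allocated, spu_images):
        host.local_stores[spe] = bytes(image)
    return allocated
```

In this sketch the four main-memory regions together correspond to the image of the extended SPUlet in the host cell processor's main memory.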

FIG. 6 illustrates additional examples of how extended SPUlets can migrate among cell processors across a network. An extended SPUlet may be created by any client. FIG. 6 shows an example of an SPUlet created by a client cell processor 601. In this example, the client cell processor is running a process that uses two of its SPEs 602, 603. The instructions and data are loaded from main memory 604 into SPU1 and SPU2 as indicated at 606, 608. SPU1 and SPU2 run at 610. The client cell processor's PPE 612 determines that it is necessary to interrupt at 614 and signals SPU1 and SPU2 to stop. Such an interrupt may occur for any number of reasons. For example, the PPE may determine that there is higher priority work that requires the SPEs 602, 603. Alternatively, the process may have proceeded to a point where it can be more efficiently completed elsewhere. In particular, the process may have proceeded to a point where it is about to generate a large amount of data that is to be transferred to a host device. It may be more efficient in terms of network bandwidth to transfer the partially completed process to the host device and let that device generate the data.

After SPU1 and SPU2 are stopped, the local store contents of SPU1 and SPU2 are saved at 616, 618 to main memory 604 as SPU images 620, 622. At 624 the PPE 612 creates a file image of an extended SPUlet containing the SPU images 620, 622 and initialized data 626. The initialized data may have been created in system memory by the SPEs 602, 603 or the PPE 612. The file image may be created by bundling the SPU images 620, 622 with the initialized data 626 and a file header as described above. At 628 the file image is sent over a network 630 to a host cell processor 631. Assuming acceptance criteria in the file header have been met, the SPU images 620, 622 and initialized data 626 are loaded into the host cell processor's main memory 634. From there, the SPU images 620, 622 can be loaded into the SPEs 632, 633 of the host cell processor 631 as indicated at 636, 638 and run as indicated at 640, 642. The SPEs 632, 633 may continue to run until they are finished as in any normal cell processing application. Upon completion, the extended SPUlet returns status to the client processor 601 and (optionally) notifies the host processor 631 of the completion. The operating system (OS) running on the host processor 631 can then destroy (i.e., overwrite) the extended SPUlet image and associated data in main memory 634.
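By way of example, and without limitation, the bundling of SPU images, initialized data and a file header into a single file image may be sketched as follows. The on-disk layout used here (a 4-byte little-endian header length, a JSON header, then the raw images and data) is purely an assumption made for illustration; the disclosure does not specify a byte layout, and the function names pack_file_image and unpack_file_image are hypothetical.

```python
import json
import struct


def pack_file_image(spu_images, initialized_data, criteria):
    """Bundle SPU images, initialized data and a header into one bytes object."""
    header = json.dumps({
        "criteria": criteria,                          # acceptance criteria for the host
        "image_sizes": [len(img) for img in spu_images],
        "data_size": len(initialized_data),
    }).encode("utf-8")
    # Hypothetical layout: [header length][header][image 0]...[image n][data]
    return struct.pack("<I", len(header)) + header + b"".join(spu_images) + initialized_data


def unpack_file_image(blob):
    """Recover (header, SPU images, initialized data) from a packed file image."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    header = json.loads(blob[offset:offset + header_len].decode("utf-8"))
    offset += header_len
    images = []
    for size in header["image_sizes"]:
        images.append(blob[offset:offset + size])
        offset += size
    data = blob[offset:offset + header["data_size"]]
    return header, images, data
```

A receiving host could inspect the "criteria" entry of the unpacked header before deciding whether to load the images, which corresponds to checking the acceptance criteria described above.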

Alternatively, at 644, the host cell processor's PPE 646 may interrupt the SPU operations on the SPEs 632, 633, e.g., to make them available for higher priority work. Once SPU operation has stopped, the SPU images can be saved to main memory as discussed above. The SPU images may be bundled with initialized data 648, code, etc. into a file image at 650. The PPE 646 can then wait at 652 until SPEs become available, after which the SPUs can resume SPU operation at 654, 656. Alternatively, the file image can be exported at 658 over the network 630 to another host 660 or back to the client processor 601.

In the preceding example, SPU images 620, 622 were saved by the client cell processor 601 at 616, 618. Similarly, the process of creating the file image at 650 could involve saving SPU images on the host cell processor 631. To allow migration of the extended SPUlet, a capability for suspending and resuming is desirable. Preferably, suspension involves a cooperative yield of execution. In particular, the host OS notifies the extended SPUlet to suspend. The SPUlet then stops all DMA and SPE execution, gracefully yields execution and notifies the host OS. The host OS can then save the execution state of the extended SPUlet.
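The cooperative yield described above may be sketched, for purposes of illustration only, as a small handshake between two objects. The classes HostOS and ExtendedSPUlet and their method names are hypothetical, and the saved state is a placeholder dictionary rather than a real execution state.

```python
class ExtendedSPUlet:
    """Simulated extended SPUlet that can be asked to suspend itself."""

    def __init__(self, name):
        self.name = name
        self.dma_active = True
        self.spe_running = True

    def request_suspend(self, on_yield):
        """Called by the host OS; the SPUlet quiesces itself, then reports back."""
        self.dma_active = False     # stop all DMA
        self.spe_running = False    # stop SPE execution
        on_yield(self)              # notify the host OS that execution was yielded


class HostOS:
    """Simulated host OS that saves state only after the SPUlet has yielded."""

    def __init__(self):
        self.saved_states = {}

    def suspend(self, spulet):
        spulet.request_suspend(self._save_state)
        return self.saved_states[spulet.name]

    def _save_state(self, spulet):
        if spulet.dma_active or spulet.spe_running:
            raise RuntimeError("SPUlet has not yielded yet")
        # Placeholder for the real execution state of the extended SPUlet.
        self.saved_states[spulet.name] = {"name": spulet.name, "suspended": True}
```

The point of the sketch is the ordering: the host OS requests suspension, the SPUlet stops its own DMA and SPE activity, and only the resulting notification triggers the save.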

The flow diagram of FIG. 7 illustrates an example of a process 700 for a client or host cell processor to save an execution state for an extended SPUlet. For the purposes of illustration, this diagram shows the actions of a cell processor's PPE 701 and one of its SPEs 702. Those of skill in the art will recognize that the same process can be expanded to save multiple SPU images.

The PPU 701 stops execution of whatever process is running on the SPE 702. For example, the PPU can write to a stop register of the SPE's SPU at 703. The SPU core of the SPE 702 consequently stops at 704. In addition, it may be necessary to stop DMA activity on the MFC of the SPE 702. Specifically, the PPU 701 can write to a DMA STOP register of the MFC of the SPE 702 at 705 to stop the DMA at 706. Once DMA has stopped, at 707 the PPU 701 can harvest the DMA state of the SPE 702. This can be done by reading DMA registers containing information regarding the state of DMA operation at the time SPU execution stopped at 706. This information can be stored in main memory as part of an extended SPUlet for the SPE 702.

At 709 the PPU harvests the local state of the SPE 702, i.e., the contents of the local storage (LS) of the SPE 702. This operation may involve writing to an SPU register and DMA reading the LS contents via the MFC. The LS typically contains both code and data, which are saved to main memory as part of the extended SPUlet, e.g., the SPU image.

It is often desirable to save the hardware state of the SPU, i.e., the values of the registers and channels, as part of the extended SPUlet. To save this information as part of the SPU image, the PPE may have to send an SPU SAVE code to the SPE 702 at 711. This operation may involve a register write and a DMA write to transfer the code. The PPU may then set the SPU's program counter at 713 and signal the SPU to run the SPU SAVE code at 715, e.g., by writing to the SPU's run register. The SPU starts the SPU SAVE code at 708, reads the registers and channels that make up the hardware state at 710, and sends the hardware state information to main memory at 712 as part of the extended SPUlet.
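The ordering of the save operations of FIG. 7 may be illustrated with the following simulation. It is a sketch under stated assumptions rather than actual Cell code: the SimulatedSPE class, its fields, the save_code placeholder and the entry address 0 are hypothetical, and real hardware would perform these steps through register and DMA accesses.

```python
class SimulatedSPE:
    """Toy model of an SPE: local store, registers, channels and DMA state."""

    def __init__(self, local_store, registers, channels, dma_state):
        self.local_store = bytearray(local_store)
        self.registers = dict(registers)
        self.channels = dict(channels)
        self.dma_state = dict(dma_state)
        self.spu_running = True
        self.dma_running = True
        self.program_counter = 0


def save_execution_state(spe, save_code=b"SAVE"):
    """Follow the FIG. 7 ordering and return the saved execution state."""
    saved = {}
    spe.spu_running = False                          # 703-704: stop the SPU core
    spe.dma_running = False                          # 705-706: stop DMA on the MFC
    saved["dma_state"] = dict(spe.dma_state)         # 707: harvest the DMA state
    saved["local_store"] = bytes(spe.local_store)    # 709: harvest the LS contents

    # 711: transfer the SPU SAVE code into the local store. The LS was copied
    # above, so overwriting part of it here does not affect the saved image.
    spe.local_store[:len(save_code)] = save_code
    # 713: point the program counter at the save code (hypothetical address 0).
    # How the original program counter is preserved is not modeled here.
    spe.program_counter = 0
    spe.spu_running = True                           # 715: run the save code

    # 708-712: the save code reads registers and channels and sends them
    # to main memory as the hardware state.
    saved["hardware_state"] = {
        "registers": dict(spe.registers),
        "channels": dict(spe.channels),
    }
    spe.spu_running = False
    return saved
```

In this sketch the local store is harvested before the save code is written, which follows the order in which the steps are described above.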

Saving the SPU image and other information is part of the process of suspending SPU operation. FIG. 8 illustrates an example of suspended information 800 that can be saved as an extended SPUlet. In this example, a task executing on a single Cell system has been suspended and made into an extended SPUlet that can migrate to another host. The information 800 includes SPU images 802, shared information 804, such as initialized data and additional code, information regarding uninitialized data 806, and a message box 808 as discussed above. The preceding information constitutes the system memory image 801. The SPU images 802 and shared information 804 may be combined with a file header 810 to form a file image 803. In addition, the information 800 includes SPU images 812 corresponding to runtime LS states 805.

The SPU images 802 in the file image 803 are what gets loaded when the SPUlet starts executing. The SPU images 802 do not have to be full Local Storage size. They can load additional code from system memory on their own. The suspended SPU images 812 are snapshots of the Local Storage state. Each snapshot has to be full Local Storage size and reflects the loading and unloading of code and data that had been done up to the point at which the SPUlet was suspended.

The information 800 further includes SPE processor execution states 814 (e.g., the hardware states and DMA states discussed above). By way of example, and without limitation, the processor execution states 814 may include registers, channel state, MFC state, instruction pointer, decrementer, and floating point exception state. The execution states 814 are separated because an extended SPUlet initially does not require such information when it gets started. An extended SPUlet can assume a fresh start with no context information required. A suspended SPUlet, by contrast, needs to save all the hardware context information in order to resume execution.

In addition, the information 800 may include management information 816, such as connection information. At a minimum, the host needs to store information about the client, such as IP addresses. The information necessary to resume execution and reestablish the connection with the client needs to be passed to the host to which the extended SPUlet migrates. What gets included here depends on the authentication model of the migration.
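The components of the suspended information 800 may be pictured as a single record, sketched below for illustration. The class name SuspendedSPUlet, the field names, the is_resumable check and the constant LOCAL_STORE_SIZE are hypothetical; in particular, the 256 KB local store size is an assumption of this sketch and is not stated in the description above.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: each SPE has a 256 KB local store.
LOCAL_STORE_SIZE = 256 * 1024


@dataclass
class SuspendedSPUlet:
    spu_images: list             # 802: start-up images, may be smaller than the LS
    shared_information: bytes    # 804: initialized data, additional code
    uninitialized_info: dict     # 806: information regarding uninitialized data
    message_box: list            # 808
    file_header: dict            # 810
    runtime_ls_images: list = field(default_factory=list)   # 812: full LS snapshots
    execution_states: list = field(default_factory=list)    # 814: hardware and DMA states
    management_info: dict = field(default_factory=dict)     # 816: e.g. client addresses

    def is_resumable(self):
        """True only if every LS snapshot is full size and has an execution state."""
        if not self.runtime_ls_images:
            return False   # a fresh SPUlet starts from images 802 with no context
        full_size = all(len(img) == LOCAL_STORE_SIZE for img in self.runtime_ls_images)
        return full_size and len(self.execution_states) == len(self.runtime_ls_images)
```

Keeping fields 812 through 816 optional reflects the distinction drawn above: a freshly started extended SPUlet needs none of them, whereas a suspended one needs all of the saved context.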

Although the idea does not exclude such a scenario, in order for the SPUlet to be capable of migration, it would likely need to be configured at compile time to be a migratable SPUlet.

Since migration is moving a program from one runtime environment to another, the program needs to be well isolated from anything else on the system. In embodiments of the invention, a Cell-based distributed network may have all of its executable programs in the form of extended SPUlets to start with. In such a case, if an extended SPUlet starts executing locally, migration to another host only requires saving the context. It is not necessary to dynamically create an extended SPUlet from arbitrary working sets of SPE programs.

The preceding discussion of FIG. 6 mentions resumption of suspended SPUlets at 654, 656. By way of example, and without loss of generality, the flow diagram of FIG. 9 illustrates a process 900 for resumption of suspended execution of an extended SPUlet. At 902 system resources, such as SPEs, main memory, message box, etc., are reallocated to run the extended SPUlet. At 904 a main memory portion of the saved information, e.g., the SPU local store runtime images, is loaded into SPEs. At 906 the SPE execution states are restored, and execution of the SPEs resumes at 908.

By way of example, and without loss of generality, the flow diagram of FIG. 10 illustrates in further detail a process for resumption of suspended execution of an SPE 1002 of a cell processor 1001. Those of skill in the art will recognize that the process depicted in FIG. 10 may be expanded to include resumption of suspended execution of multiple SPEs. In this example, a main memory 1004 of a cell processor 1001 is loaded with an extended SPUlet 1006, e.g., a file image, containing an SPU hardware state 1008, an SPU local store image 1010 and a DMA state 1012. The extended SPUlet 1006 may have been stored as a result of interruption and suspension of a process running on the cell processor 1001 or it may have been imported from another cell processor. In either case it is assumed for the purposes of this example that the SPE 1002 is stopped at 1014.

The PPU 1016 of the cell processor 1001 sends a hardware state loader program 1018 to the SPE 1002. This operation may involve a DMA write to the LS of the SPE 1002 and a register write to the SPU of the SPE 1002 to run the hardware state loader program at 1020. At 1022, under control of the hardware state loader program 1020, the SPE 1002 loads the SPU hardware state 1008 from the extended SPUlet 1006 stored in main memory 1004 and executes a STOP and SIGNAL instruction. Under this instruction, execution of the program in the SPU stops, and the external environment (e.g., the PPU 1016) is signaled. No further instructions are executed. At 1024, the PPU 1016 then loads the SPU image 1010 from main memory 1004 to the local store of the SPE 1002. At 1026, the PPU 1016 loads the DMA state 1012 from main memory 1004 to SPE local store, e.g., by writing to appropriate registers.

The PPU 1016 sends a DMA start command to the SPE's MFC at 1028 to start DMA operation. This may involve writing to an MFC start register. DMA runs commence at 1030. The program counter is set at 1032. The PPU 1016 sends an SPU run command at 1034, e.g., by writing to an SPU run register. The SPU then runs at 1036, e.g., starting from the point where operation had been suspended.

Note that the process for initially loading an extended SPUlet is essentially as described above with respect to FIG. 10, except that the hardware state and DMA state loading sequences may be eliminated. For an initial load, there is generally no DMA or hardware state to restore.
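The resumption sequence of FIG. 10, and the way an initial load differs from it, may be sketched as follows. The function resume_spe, the dictionary keys and the "entry_point" field are hypothetical names chosen for this illustration, and the SPE is modeled as a plain dictionary rather than real hardware.

```python
def resume_spe(spe, spulet, initial_load=False):
    """Apply the FIG. 10 ordering to a simulated SPE; return the steps taken."""
    steps = []
    if not initial_load:
        # 1018-1022: the hardware state loader restores the SPU hardware
        # state, then stops and signals the PPU.
        spe["hardware_state"] = dict(spulet["hardware_state"])
        steps.append("restore hardware state")

    # 1024: the PPU loads the SPU image into the local store.
    spe["local_store"] = bytes(spulet["local_store_image"])
    steps.append("load local store image")

    if not initial_load:
        # 1026: the PPU loads the saved DMA state.
        spe["dma_state"] = dict(spulet["dma_state"])
        steps.append("load DMA state")

    # 1028-1030: start DMA operation.
    spe["dma_running"] = True
    steps.append("start DMA")
    # 1032: set the program counter (hypothetical "entry_point" field).
    spe["program_counter"] = spulet.get("entry_point", 0)
    steps.append("set program counter")
    # 1034-1036: run the SPU.
    spe["spu_running"] = True
    steps.append("run SPU")
    return steps
```

Calling the function with initial_load=True skips only the hardware state and DMA state steps, which is the difference between a fresh start and a resumption noted above.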

While the above is a complete description of the preferred embodiment of the present invention, it is possible to use various alternatives, modifications and equivalents. Therefore, the scope of the present invention should be determined not with reference to the above description but should, instead, be determined with reference to the appended claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the claims that follow, the indefinite article “A”, or “An” refers to a quantity of one or more of the item following the article, except where expressly stated otherwise. The appended claims are not to be interpreted as including means-plus-function limitations, unless such a limitation is explicitly recited in a given claim using the phrase “means for.”

What is claimed is:
 1. A method for operating a client processor and a host processor over a network, wherein the client processor has a client main memory, a client main processor and two or more client secondary processors and the host processor has a host main processor, a host main memory and two or more host secondary processors, wherein each client secondary processor and each host secondary processor has an exclusively associated memory, the method comprising: running a process on a client processor wherein the process runs on two or more client secondary processors; suspending the process that runs on the two or more client secondary processors; saving an execution state of the process that runs on the two or more client secondary processors by either a) saving an execution state of two or more client secondary processors running the process on the client processor into a single file in the main memory, wherein the execution state includes contents of the exclusively associated memories for the two or more client secondary processors, and a hardware state of the two or more client secondary processors that were running the process, wherein the contents of the exclusively associated memories for the two or more client secondary processors includes executable code for running the process, or b) saving an execution state of one or more of the client secondary processors that were running the process that runs on the two or more client secondary processors and shared initialized data for the process that runs on the two or more client secondary processors into a single file in the main memory, wherein the execution state includes contents of the exclusively associated memory for at least one of the client secondary processors and shared initialized data of the suspended process and a hardware state of the at least one of the client secondary processors that were running the suspended process, wherein the contents of the exclusively associated memory of the at least one of the client secondary processors includes executable code for running the process; and transferring the single file from the client processor over a network to the host processor; wherein the single file containing the saved execution state of the process is configured to permit the host processor to resume processing of the suspended process.
 2. The method of claim 1, further comprising allocating space in a main memory of a host processor (host main memory) for the contents of the exclusively associated memory of the one secondary processor and initialized data.
 3. The method of claim 1 further comprising, allocating an area in a main memory of a host processor (host main memory) for uninitialized data and a message box.
 4. The method of claim 1, further comprising loading the file onto the host processor.
 5. The method of claim 1 wherein saving the state includes stopping a core of the secondary processor running the process.
 6. The method of claim 1 wherein saving the state further includes sending a save code to the secondary processor and running the code on the secondary processor.
 7. The method of claim 1, further comprising resuming the process on a secondary processor of the host processor.
 8. The method of claim 1, further comprising transferring a file header from the client processor to the host processor.
 9. A non-transitory processor readable medium having embodied therein executable code and data representing a saved execution state of a processor system having a main processing unit, two or more secondary processors, and a main memory coupled to the main and secondary processors, wherein each secondary processor has an exclusively associated memory, the saved execution state comprising: a single file containing either a) contents of the exclusively associated memory for one of the two or more secondary processors and shared initialized data related to a suspended process execution state, wherein the contents of the exclusively associated memory and shared initialized data include executable code configured to run the suspended process on the processor system or b) contents of exclusively associated memories of two or more secondary processors related to the execution state of a suspended process including executable code for processing data with the two or more secondary processors and data to be processed by executing the code with the two or more of the secondary processors, wherein the contents of the two or more of the exclusively associated memories include executable code configured to run the suspended process on the processor system; wherein the single file containing the saved execution state of the process is configured to permit a different processor system to resume processing of the suspended process.
 10. The processor readable medium of claim 9 wherein the file further comprises a file header.
 11. The processor readable medium of claim 10 wherein the file header includes one or more of the following types of information: memory availability, secondary processor availability, network latency, network bandwidth, system frequency, control flow information, memory offsets, breakpoints of contents of one or more exclusively associated local memories for secondary processors, size of contents of one or more exclusively associated local memories for secondary processors, memory layout, memory mapping information, host resources, connection requirements, and other criteria describing the environment in which the file should run.
 12. The processor readable medium of claim 10 wherein the file header defines information in connection with a user, id, system, function, data type, channel, flag, key, password, protocol, target or profile or any metric in which system or operation may be established wherein such may relate to or be directed from the file.
 13. The processor readable medium of claim 10 wherein the file header defines information related to configuration, initialization, modification or synchronization of operations involving any program or system or module or object that satisfies an overall goal of the application in which the file operates.
 14. The processor readable medium of claim 11 wherein the control flow information includes information regarding whether a host or client processor can interrupt the process.
 15. A processor system having a main processing unit, two or more secondary processors and a main memory coupled to the main and secondary processors, wherein each secondary processor has an exclusively associated memory, the processor having embodied in the main memory or exclusively associated memory data representing a saved process execution state of a suspended process, the saved process execution state comprising: a single file containing either a) contents of an exclusively associated memory of a secondary processor and shared initialized data related to a suspended process execution state of a different processor system, wherein the contents of the exclusively associated memory includes executable code for running the suspended process on the processor system, or b) contents of exclusively associated memories of two or more secondary processors related to the saved process execution state, wherein the contents of the two or more secondary processors includes executable code for running the suspended process on the processor system; wherein the single file containing the saved execution state of the process is configured to permit a different processor system to resume processing of the suspended process.
 16. The processor system of claim 15 wherein the file further comprises a file header.
 17. The processor system of claim 16 wherein the file header includes one or more of the following types of information: memory availability, secondary processor availability, network latency, network bandwidth, system frequency, control flow information, memory offsets, breakpoints of contents of one or more exclusively associated local memories for secondary processors, size of contents of one or more exclusively associated local memories for secondary processors, memory layout, memory mapping information, host resources, connection requirements, and other criteria describing the environment in which the file should run.
 18. The processor system of claim 16 wherein the file header defines information in connection with a user, id, system, function, data type, channel, flag, key, password, protocol, target or profile or any metric in which system or operation may be established wherein such may relate to or be directed from the file.
 19. The processor system of claim 16 wherein the file header defines information related to configuration, initialization, modification or synchronization of operations involving any program or system or module or object that satisfies an overall goal of the application in which the file operates.
 20. The processor system of claim 17 wherein the control flow information includes information regarding whether a host or client processor system can interrupt the process.